1 Dataset description

There are two submissions: 10267 & 10270.

  • Each submission includes 2390 families with .vcf files.
  • For each family, two .vcf files are provided:
    • one named “sorted”;
    • the other named “annotated”.

1.1 Submission 10267

  • For files named “sorted”,
    • 852 families without GL/PL information
    • 1537 families with valid GL/PL information
      • 333 Trios
      • 1204 Quads
  • For files named “annotated”,
    • 1096 families without GL/PL information
    • 1293 families with valid GL/PL information
      • 309 Trios
      • 984 Quads

Note that for FID:13562, the .vcf file contains no father information. Also, every family with valid GL/PL information in the “annotated” files also has valid GL/PL information in the “sorted” files.



1.2 Submission 10270

  • For files named “sorted”, there is no GL/PL information.
  • For files named “annotated”,
    • 596 families without valid GL/PL information
      • including 13 families with fewer than 2000 variants.
    • 1794 families with valid GL/PL information
      • 429 Trios
      • 1365 Quads


1.3 Combined

Combining 10267 & 10270, there are 2206 families with complete .vcf files, which were used for further DNM analyses.

  • 526 Trios
  • 1680 Quads
  • 1145 females, 2639 males and 102 individuals with unknown sex.
  • These correspond to 2206 probands and 1680 unaffected siblings:
    • Probands: 282 females, 1869 males and 55 with unknown sex
    • Siblings: 863 females, 770 males and 47 with unknown sex


2 Call de novo mutations

Triodenovo was used to call de novo mutations:

  • Only variants with GL/PL information were retained.
  • Families were split into parent-offspring trios.
  • Filters: --minDP 7 --minDepth 10; all other options were left at their defaults.
  • Post filters (following Homsy et al. 2015, Science):
    • For offspring: a minimum of 10 total reads and 5 alternate allele reads, plus a minimum alternate allele ratio of 20% if there are ≥10 alternate allele reads, or 28% if there are <10.
    • For parents: a minimum depth of 10 reference reads and an alternate allele ratio <3.5%.

The script is stored at /scratch/90days/uqywan67/auti_proj/SSC/scripts/call_deno.R
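
The post filters listed above can be sketched as a small read-count check. This is a minimal Python illustration, not the contents of call_deno.R. It assumes that each sample's reference and alternate read counts are already available (for example from the VCF AD field) and that "total reads" means reference plus alternate reads; the function names are hypothetical.

```python
def offspring_passes(ref_reads: int, alt_reads: int) -> bool:
    """Offspring rule: >=10 total reads, >=5 alternate reads, and an
    alternate allele ratio of >=20% when alt reads >= 10, otherwise >=28%."""
    total = ref_reads + alt_reads  # assumption: total = ref + alt
    if total < 10 or alt_reads < 5:
        return False
    min_ratio = 0.20 if alt_reads >= 10 else 0.28
    return alt_reads / total >= min_ratio


def parent_passes(ref_reads: int, alt_reads: int) -> bool:
    """Parent rule: >=10 reference reads and an alternate allele ratio < 3.5%."""
    if ref_reads < 10:
        return False
    return alt_reads / (ref_reads + alt_reads) < 0.035


def passes_post_filter(offspring, father, mother) -> bool:
    """Each argument is a (ref_reads, alt_reads) tuple for one trio member."""
    return (offspring_passes(*offspring)
            and parent_passes(*father)
            and parent_passes(*mother))


# Example: a candidate call with 10/30 alternate reads in the offspring
# and clean parents.
print(passes_post_filter((20, 10), (30, 0), (30, 1)))
```

Keeping the offspring and parent rules in separate functions makes it easy to apply the same check to each trio obtained from a Quad.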




3 Annotation

  • ANNOVAR was used to annotate refGene and allele frequencies.
    • hg19refGene, exac03nonpsy, gnomad_exome211 databases were used.
    • Based on the annotation, DNMs were further filtered to retain:
      • exonic or canonical splice-site variants
      • variants with MAF <= 0.001 in the non-psychiatric subset of ExAC (header: ExAC_nonpsych_ALL in ANNOVAR) and in the control samples of gnomAD (header: controls_AF_popmax in ANNOVAR).
  • Gene-level pLI scores for PTVs were downloaded from ExAC.
  • MPC scores for missense variants were annotated using VEP.
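
The annotation-based filter above can be sketched as a per-variant predicate. This is a minimal Python illustration under stated assumptions: each variant is a dict keyed by ANNOVAR output headers, the region annotation sits in a column named Func.refGene with values such as "exonic" or "splicing", and a "." in a frequency column means the variant is absent from that database and is treated as frequency 0. Only ExAC_nonpsych_ALL and controls_AF_popmax are taken from the text; everything else is assumed for illustration.

```python
MAF_CUTOFF = 0.001


def parse_af(value: str) -> float:
    """Convert an allele-frequency field to float.
    Assumption: '.', '' or 'NA' means the variant is absent (frequency 0)."""
    return 0.0 if value in (".", "", "NA") else float(value)


def keep_variant(row: dict) -> bool:
    """Keep exonic or splice-site variants with MAF <= 0.001 in both
    the ExAC non-psychiatric subset and the gnomAD controls."""
    # A variant can carry several labels, e.g. "exonic;splicing".
    labels = row["Func.refGene"].split(";")
    in_region = "exonic" in labels or "splicing" in labels
    rare = (parse_af(row["ExAC_nonpsych_ALL"]) <= MAF_CUTOFF
            and parse_af(row["controls_AF_popmax"]) <= MAF_CUTOFF)
    return in_region and rare


# Example: an exonic variant absent from ExAC and rare in gnomAD controls.
example = {"Func.refGene": "exonic",
           "ExAC_nonpsych_ALL": ".",
           "controls_AF_popmax": "0.0005"}
print(keep_variant(example))
```

Splitting the region label on ";" avoids matching labels such as "ncRNA_exonic", which a plain substring test would wrongly accept.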

3.1 DNMs summary

After applying the filters, a total of 4202 DNMs were found in 2456 offspring from 1768 families.

  • 3412/4202 (81.2%) DNMs matched published SSC DNMs from Krumm et al. 2015 and Iossifov et al. 2014.
  • 345 Trios (with 586 DNMs) and 1423 Quads (with 3616 DNMs, including 1741 DNMs in 1058 probands and 1875 DNMs in 1053 siblings).
  • 2781 DNMs in 1651 males, 1307 DNMs in 740 females and 114 DNMs in 65 individuals with unknown sex information.
  • 2327 DNMs in 1403 probands and 1875 DNMs in 1053 siblings.
  • 2831 DNMs were absent from ExAC, 2905 were absent from gnomAD, and 2614 were absent from both databases.

3.1.1 DNM counts

Note that a cutoff of 10 was used to exclude individuals with DNM counts > 7, which corresponds to the 99% quantile.



3.1.2 All DNMs



3.1.3 pLIs for PTVs



3.1.4 MPC scores for missense variants

3.2 DNMs in Quads

  • A total of 3481 DNMs were observed in 1411 Quads
    • 1719 DNMs in 1048 probands and 1841 DNMs in 1039 siblings
    • 3115 DNMs in 1834 males and 445 DNMs in 253 females.

3.2.1 DNM counts



3.2.2 All DNMs



3.2.3 pLIs for PTVs



3.2.4 MPC scores for missense variants



4 Burden analysis

As noted above, a total of 526 Trios and 1680 Quads were used for further DNM analyses, corresponding to 2206 probands and 1680 siblings in total.

  • At least one DNM event was observed in 1403/2206 (63.6%) probands and 1053/1680 (62.7%) unaffected siblings.
  • The DNM counts per individual were 1.05 (2327/2206) and 1.12 (1875/1680) for probands and siblings, respectively.
    • For missense variants, there were 1.05 (2327/2206) DNMs per sample in probands and 1.12 (1875/1680) DNMs per sample in siblings.
    • For PTVs, there were 0.10 (210/2206) DNMs per sample in probands and 0.06 (102/1680) DNMs per sample in siblings.
    • For synonymous variants, there were 0.27 (594/2206) DNMs per sample in probands and 0.29 (483/1680) DNMs per sample in siblings.

A two-sided exact binomial test was used to compare the DNM burden between groups.

  • binom.test(x, n, p, alternative = "two.sided").
  • E.g. for probands versus siblings: x, the number of successes, is the number of proband variants; n, the number of trials, is the total number of proband and sibling variants; and p, the hypothesized probability of success, is the fraction of individuals that are probands.
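
The test setup described above can be sketched in Python using only the standard library. This is an illustration, not the code used for the analysis: it reimplements the two-sided exact binomial p-value as the total probability of all outcomes that are no more likely than the observed one, with the same small relative tolerance that R's binom.test applies. The function names are hypothetical, and p must lie strictly between 0 and 1.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials (0 < p < 1),
    computed on the log scale to avoid overflow for large n."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))


def binom_test_two_sided(x: int, n: int, p: float) -> float:
    """Two-sided exact p-value: sum the probabilities of every outcome
    whose probability does not exceed that of the observed count x."""
    threshold = binom_pmf(x, n, p) * (1 + 1e-7)  # tolerance for float ties
    total = sum(pk for pk in (binom_pmf(k, n, p) for k in range(n + 1))
                if pk <= threshold)
    return min(1.0, total)


# Probands versus siblings for de novo PTVs, using counts from this report:
x = 210                          # PTVs in probands
n = 210 + 102                    # PTVs in probands plus siblings
p = 2206 / (2206 + 1680)         # fraction of individuals that are probands
p_value = binom_test_two_sided(x, n, p)
print(f"p = {p_value:.3g}")
```

Setting p to the fraction of probands encodes the null hypothesis that variants fall on probands and siblings in proportion to their sample sizes.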

Summary of significant findings in burden analyses

  • Finding 1: There is a 1.57-fold enrichment of de novo PTVs (210 in 2206 probands versus 102 in 1680 siblings; 0.095 versus 0.061 variants per sample; p = 1.56E-4).
  • Finding 2: There is a 4.08-fold enrichment of pLI tier [0.995, ∞) (59 in 2206 probands versus 11 in 1680 siblings; 0.027 versus 0.007 variants per sample; p = 1.36E-6).
  • Finding 3: There is a 1.45-fold enrichment of MPC tier [2, ∞) (101 in 2206 probands versus 53 in 1680 siblings; 0.046 versus 0.032 variants per sample; p = 0.03).
  • Finding 4: In each RRB cluster, there is a significant enrichment of de novo PTVs in probands versus siblings, but only Cluster2 remained significant after Bonferroni correction.
  • Finding 5: In each RRB cluster, there is a significant enrichment of pLI tier [0.995, ∞) in probands versus in siblings.
  • Finding 6: There is a 2.01-fold enrichment of MPC tier [2, ∞) between RRB clusters (41 in 594 Cluster1 samples versus 34 in 988 Cluster2 samples; 0.069 versus 0.034 variants per sample; p = 2.74E-3).
  • Finding 7: There is a 2.19-fold enrichment of MPC tier [2, ∞) in RRB Cluster1 (41 in 594 Cluster1 probands versus 53 in 1680 siblings; 0.069 versus 0.032 variants per sample; p = 2.36E-3).
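
The fold enrichments quoted in these findings are ratios of per-sample variant rates. The following Python sketch reproduces that arithmetic for two of the findings, assuming fold enrichment is defined as the rate in the first group divided by the rate in the second; the function name is hypothetical.

```python
def fold_enrichment(count_a: int, n_a: int, count_b: int, n_b: int) -> float:
    """Ratio of variants per sample in group A to variants per sample in group B."""
    return (count_a / n_a) / (count_b / n_b)


# Finding 1: de novo PTVs, 210 in 2206 probands versus 102 in 1680 siblings.
ptv_fold = fold_enrichment(210, 2206, 102, 1680)

# Finding 7: MPC tier [2, inf), 41 in 594 Cluster1 probands versus 53 in 1680 siblings.
mpc_fold = fold_enrichment(41, 594, 53, 1680)

print(f"PTV fold enrichment: {ptv_fold:.2f}")
print(f"MPC Cluster1 fold enrichment: {mpc_fold:.2f}")
```

Computing the ratio from the raw counts rather than from the rounded per-sample rates avoids compounding rounding error in the reported fold values.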

4.1 All DNMs

4.1.1 Probands VS Siblings



4.1.2 Females VS Males



4.2 pLI tiers



4.3 MPC tiers



4.4 RRB cluster

4.4.1 All DNMs

4.4.2 pLI tiers

4.4.3 MPC tiers